Summary. Phenomena with a constrained sample space appear frequently in practice. This
is the case, e.g., with strictly positive data and with compositional data such as percentages.
If the natural measure of difference is not the absolute one, it is possible to use simple
algebraic properties to show that it is more convenient to work with a geometry that is not the
usual Euclidean geometry in real space, and with a measure which is not the usual Lebesgue
measure, leading to alternative models which better fit the phenomenon under study. The
general approach is presented and illustrated both on the positive real line and on the D-part
simplex.

Keywords: Lognormal; Additive logistic normal.

1. Introduction 

In general, any statistical analysis is performed assuming data to be realisations of real
random vectors whose density functions are defined with respect to the Lebesgue measure,
which is a natural measure in real space and compatible with its inner vector space
structure. Sometimes, as in the case of observations measured in percentages, random
vectors are defined on a constrained sample space, $E \subseteq \mathbb{R}^D$, and methods and concepts
used in real space lead to absurd results, as is well known from examples like the
spurious correlations between proportions (Pearson, 1897). This problem can be circumvented
when $E$ admits a meaningful Euclidean space structure different from the usual
one (Pawlowsky and Egozcue, 2001). In fact, if $E$ is a Euclidean space, a measure $\lambda_E$,
compatible with its structure, is obtained from the Lebesgue measure on orthonormal
coordinates (Eaton, 1983; Pawlowsky-Glahn, 2003). Then, a probability density function, $f^E$,
is defined on $E$ as the Radon-Nikodym derivative of a probability measure $P$ with respect
to $\lambda_E$. The measure $\lambda_E$ has properties comparable to those of the Lebesgue measure in
real space. Difficulties arising from the fact that the integral
$P(A) = \int_A f^E(\mathbf{x})\,\mathrm{d}\lambda_E(\mathbf{x})$ is not an ordinary one are solved by working with
coordinates (Eaton, 1983), and in particular with coordinates with respect to an orthonormal
basis (Pawlowsky-Glahn, 2003), as properties that hold in the space of coordinates transfer
directly to the space $E$. For example, for $f^E$ a density function on $E$, call $f$ the density
function of the coordinates; then the probability of an event $A \subseteq E$ is computed as
$P(A) = \int_V f(\mathbf{v})\,\mathrm{d}\lambda(\mathbf{v})$, where $V$ and $\mathbf{v}$ are the representations of $A$ and $\mathbf{x}$ in
terms of the orthonormal coordinates chosen, and $\lambda$ is the Lebesgue measure in the space of
coordinates. Using $f$ to compute any element of the sample space, e.g. the expected value,
the coordinates of this element with respect to the same orthonormal basis are obtained.
The corresponding element in $E$ is then given by the representation of the element in the basis.

Every one-to-one transformation between a set $E$ and real space induces a real Euclidean
space structure in $E$, with associated measure $\lambda_E$. Particularly interesting are
those transformations related to the measure of difference between observations, as
evidenced by Galton (1879) when introducing the logarithmic transformation as a means to
acknowledge Fechner's law, according to which perception equals log(stimulus), formalised
by McAlister (1879).

This simple approach has acquired a growing importance in applications, since it has
been recognised that many constrained sample spaces which are subsets of some real space,
like $\mathbb{R}_+$ or the simplex, can be structured as Euclidean vector spaces (Pawlowsky and Egozcue, 2001).
It is important to emphasise that this approach implies using a measure which is different
from the usual Lebesgue measure. Its advantage is that it opens the door to alternative
statistical models depending not only on the assumed distribution, but also on the measure
which is considered as appropriate or natural for the studied phenomenon, thus enhancing
interpretation. Changing not only the space structure but also the measure is a powerful
tool, because it leads to results that are coherent with the interpretation of the measure
of difference and that are mathematically more straightforward.



2. Probability densities in Euclidean vector spaces 

Let $E \subseteq \mathbb{R}^D$ be the sample space for a random vector $\mathbf{X}$, i.e. each realization of $\mathbf{X}$ is in
$E$. Assume that there exists a one-to-one differentiable mapping $h : E \to \mathbb{R}^d$ with $d \le D$.
This mapping allows one to define a Euclidean structure on $E$ simply by translating the standard
properties of $\mathbb{R}^d$ into $E$. The existence of the mapping $h$ implies some characteristics of $E$.
An important one in this context is that $E$ must have some border set so that $h$ transforms
neighborhoods of this border into neighborhoods of infinity in $\mathbb{R}^d$. For instance, a sphere
in $\mathbb{R}^3$ with a defined pole can be transformed into $\mathbb{R}^2$, but, if no pole is defined, this is no
longer possible.

The inner sum and the outer product in $E$ are defined as
$\mathbf{x} \oplus \mathbf{y} = h^{-1}(h(\mathbf{x}) + h(\mathbf{y}))$ and $\alpha \odot \mathbf{x} = h^{-1}(\alpha \cdot h(\mathbf{x}))$,
where $\mathbf{x}, \mathbf{y}$ are in $E$ and $\alpha \in \mathbb{R}$. With these definitions $E$ is a vector space of
dimension $d$. The metric structure is induced by the inner product
$\langle \mathbf{x}, \mathbf{y} \rangle_E = \langle h(\mathbf{x}), h(\mathbf{y}) \rangle$, which implies the norm and the distance
$\|\mathbf{x}\|_E = \|h(\mathbf{x})\|$, $d_E(\mathbf{x}, \mathbf{y}) = d(h(\mathbf{x}), h(\mathbf{y}))$, thus completing the Euclidean
structure of $E$, based on the inner product, norm and distance in $\mathbb{R}^d$, denoted as
$\langle \cdot, \cdot \rangle$, $\|\cdot\|$, $d(\cdot, \cdot)$ respectively. By construction, $h(\mathbf{x})$ is the vector of
coordinates of $\mathbf{x} \in E$. The coordinates correspond to the orthonormal basis in $E$ given by
the images of the canonical basis in $\mathbb{R}^d$ by $h^{-1}$. The Lebesgue measure in $\mathbb{R}^d$, $\lambda_d$,
induces a measure in $E$, denoted $\lambda_E$, just by defining $\lambda_E(h^{-1}(B)) = \lambda_d(B)$
for any Borel set $B$ in $\mathbb{R}^d$.
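
The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `make_structure` and the returned function names are hypothetical, the sketch is restricted to d = 1, and the choice of h = ln on the positive real line is an assumed example.

```python
import math

def make_structure(h, h_inv):
    """Return the operations induced on E by a one-to-one map h: E -> R (d = 1)."""
    def oplus(x, y):
        # inner sum: add the coordinates, then map back to E
        return h_inv(h(x) + h(y))

    def odot(alpha, x):
        # outer product: scale the coordinate, then map back to E
        return h_inv(alpha * h(x))

    def dist(x, y):
        # induced distance: Euclidean distance between coordinates
        return abs(h(x) - h(y))

    return oplus, odot, dist

# Assumed example: E is the positive real line and h is the natural logarithm
oplus, odot, dist = make_structure(math.log, math.exp)
sum_2_3 = oplus(2.0, 3.0)    # behaves like the product 2 * 3
pow_3_2 = odot(3.0, 2.0)     # behaves like the power 2 ** 3
d_1_e = dist(1.0, math.e)    # |ln e - ln 1|
```

Passing `h` and `h_inv` as arguments keeps the sketch independent of the particular sample space; only the coordinate map changes from one constrained space to another.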

In order to define pdf's in $E$, a reference measure is needed. When $E$ is viewed as a
subset of $\mathbb{R}^D$, the Lebesgue measure, $\lambda_D$, could in principle be used. However, if $d < D$ the
random vector $\mathbf{X}$ cannot be absolutely continuous with respect to $\lambda_D$. Our proposal, and a
more natural way to define a pdf for $\mathbf{X}$, is to start with a pdf for the (random) coordinates
$\mathbf{Y} = h(\mathbf{X})$ in $\mathbb{R}^d$. Assume that $f_{\mathbf{Y}}$ is the pdf of $\mathbf{Y}$ with respect to the Lebesgue measure, $\lambda_d$,
in $\mathbb{R}^d$, i.e. $\mathbf{Y}$ is absolutely continuous with respect to $\lambda_d$ and the pdf is the Radon-Nikodym
derivative $f_{\mathbf{Y}} = \mathrm{d}P/\mathrm{d}\lambda_d$.

The random vector $\mathbf{X}$ is recovered from $\mathbf{Y}$ as $\mathbf{X} = h^{-1}(\mathbf{Y})$ but, when $D > d$, $h^{-1}$ can be
restricted to only $d$ of its components; let $h_d^{-1}$ be such a restriction and $\mathbf{X}_d = h_d^{-1}(\mathbf{Y})$. The
inverse mapping is denoted by $h_d(\mathbf{X}_d) = h(\mathbf{X})$. This means that more than $d$ components
result in a redundant definition of $\mathbf{X}$. When $D = d$, the restriction of $h^{-1}$ reduces to the
identity $h_d^{-1} = h^{-1}$.



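The pdf of $\mathbf{X}_d$ with respect to the Lebesgue measure in $\mathbb{R}^d$ is computed using the
Jacobian rule

$$ f_{\mathbf{X}_d}(\mathbf{x}_d) = \frac{\mathrm{d}P}{\mathrm{d}\lambda_d}(\mathbf{x}_d) = f_{\mathbf{Y}}(h_d(\mathbf{x}_d)) \cdot \left| \frac{\partial h_d(\mathbf{x}_d)}{\partial \mathbf{x}_d} \right| , \tag{1} $$

where the last term is the $d$-dimensional Jacobian of $h_d$.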

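The next step is to express the pdf with respect to $\lambda_E$, the natural measure in the sample
space $E$. The chain rule for Radon-Nikodym derivatives implies

$$ f^E_{\mathbf{X}_d}(\mathbf{x}_d) = \frac{\mathrm{d}P}{\mathrm{d}\lambda_E}(\mathbf{x}_d) = \frac{\mathrm{d}P}{\mathrm{d}\lambda_d}(\mathbf{x}_d) \cdot \frac{\mathrm{d}\lambda_d}{\mathrm{d}\lambda_E}(\mathbf{x}_d) , \tag{2} $$

and the last derivative is

$$ \frac{\mathrm{d}\lambda_d}{\mathrm{d}\lambda_E}(\mathbf{x}_d) = \left| \frac{\partial h_d^{-1}(h_d(\mathbf{x}_d))}{\partial \mathbf{y}} \right| = \left| \frac{\partial h_d(\mathbf{x}_d)}{\partial \mathbf{x}_d} \right|^{-1} , \tag{3} $$

due to the inverse function theorem. Substituting (1) and (3) into (2),

$$ f^E_{\mathbf{X}}(\mathbf{x}) = \frac{\mathrm{d}P}{\mathrm{d}\lambda_E}(\mathbf{x}) = f_{\mathbf{Y}}(h(\mathbf{x})) , \tag{4} $$

where the subscripts $d$ have been suppressed because they only play a role when computing
the Jacobians.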
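
As a worked illustration of how the Jacobian cancels, assume $E = \mathbb{R}_+$, $D = d = 1$ and $h(x) = \ln x$ (the case developed in Section 3). The steps below only specialise (1), (2) and (3) to this assumed choice of $h$:

```latex
\begin{align*}
f_X(x) &= f_Y(\ln x)\,\left|\frac{\mathrm{d}\ln x}{\mathrm{d}x}\right|
        = f_Y(\ln x)\,\frac{1}{x}
        && \text{by (1)} \\
\frac{\mathrm{d}\lambda}{\mathrm{d}\lambda_E}(x)
       &= \left|\frac{\mathrm{d}\ln x}{\mathrm{d}x}\right|^{-1} = x
        && \text{by (3)} \\
f^E_X(x) &= f_X(x)\cdot x = f_Y(\ln x)
        && \text{by (2), which is (4)}
\end{align*}
```

The factor $1/x$ that appears in the density with respect to the Lebesgue measure is exactly compensated by the change of reference measure.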

The representation of random variables by pdf's defined with respect to the measure
$\lambda_E$ requires a review of the moments and other characteristics of the pdf's. Following
Eaton (1983), the expectation and variance of $\mathbf{X}$ can be defined as follows. Let $\mathbf{X}$ be a
random variable supported on $E$ and $h : E \to \mathbb{R}^d$ the coordinate function defined on $E$.
The expectation in $E$ is

$$ \mathrm{E}^E[\mathbf{X}] = \int_E^{\oplus} \mathbf{x}\, f^E_{\mathbf{X}}(\mathbf{x})\,\mathrm{d}\lambda_E(\mathbf{x}) = h^{-1}\left( \int_{\mathbb{R}^d} \mathbf{y}\, f_{h(\mathbf{X})}(\mathbf{y})\,\mathrm{d}\mathbf{y} \right) \tag{5} $$

$$ \phantom{\mathrm{E}^E[\mathbf{X}]} = h^{-1}\left( \mathrm{E}[h(\mathbf{X})] \right) , \tag{6} $$

provided the integrals exist in the Lebesgue sense. This definition deserves some remarks.
The first integral in (5) has been superscripted with $\oplus$ because the involved sum is $\oplus$ for
elements in $E$. The practical way to carry out the integral is to represent the elements of $E$
using coordinates and to integrate using the pdf of the coordinates; the result is transformed
back into $E$. Finally, (6) summarizes the previous equation using the standard definition of
expectation of the coordinates in $\mathbb{R}^d$.

Variance involves only real expectations and can be identified with the variance of the
coordinates. The metric variance or total variance deserves special attention (Aitchison, 1986;
Pawlowsky and Egozcue, 2001). Assuming the existence of the integrals, the metric variability
of $\mathbf{X}$ with respect to a point $\mathbf{z} \in E$ can be defined as $\mathrm{Var}[\mathbf{X}, \mathbf{z}] = \mathrm{E}[d_E^2(\mathbf{X}, \mathbf{z})]$. The minimum
metric variability is attained for $\mathbf{z} = \mathrm{E}^E[\mathbf{X}]$, thus supporting the definition (5)-(6). The
metric variance is then

$$ \mathrm{Var}[\mathbf{X}] = \mathrm{E}[d_E^2(\mathbf{X}, \mathrm{E}^E[\mathbf{X}])] . \tag{7} $$
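
A sample analogue of (6) and (7) can be sketched as follows. The function names are hypothetical, the data are made up for illustration, and h = ln is an assumed coordinate map (the positive real line treated in Section 3).

```python
import math

def expectation_E(sample, h, h_inv):
    # sample analogue of (6): h^{-1} of the arithmetic mean of the coordinates
    return h_inv(sum(h(x) for x in sample) / len(sample))

def metric_variability(sample, z, h):
    # sample analogue of Var[X, z] = E[d_E^2(X, z)]
    return sum((h(x) - h(z)) ** 2 for x in sample) / len(sample)

sample = [1.0, 10.0, 100.0]                          # hypothetical positive data
centre = expectation_E(sample, math.log, math.exp)   # the geometric mean when h = ln
arith_mean = sum(sample) / len(sample)

var_at_centre = metric_variability(sample, centre, math.log)
var_at_arith_mean = metric_variability(sample, arith_mean, math.log)
```

In this sketch the metric variability evaluated at `centre` is smaller than at the arithmetic mean, in line with the minimising property stated above.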

The mode of a pdf is normally defined as the point at which it attains its maximum value,
although local maxima are normally also called modes. However, the shape and, particularly,
the location of the maxima depend on the reference measure taken in the Radon-Nikodym
derivatives of the density. Since the Lebesgue measure in the coordinate space, $\mathbb{R}^d$,
corresponds to the measure $\lambda_E$, the mode can be defined as

$$ \mathrm{Mode}^E[\mathbf{X}] = \operatorname*{argmax}_{\mathbf{x} \in E} \{ f^E_{\mathbf{X}}(\mathbf{x}) \} = h^{-1}\left( \operatorname*{argmax}_{\mathbf{y} \in \mathbb{R}^d} \{ f_{h(\mathbf{X})}(\mathbf{y}) \} \right) , $$

where the usual remarks on multiple modes or asymptotes are in order.



3. The positive real line 

The real line, with the ordinary sum and product by scalars, has a vector space structure.
The ordinary inner product and the Euclidean distance are compatible with these
operations. But this geometry is not suitable for the positive real line. Confront, for
example, some meteorologists with two pairs of samples taken at two rain gauges, {5; 10}
and {100; 105} in mm, and ask for the difference; quite probably, in the first case they will
say there was double the total rain in the second gauge compared to the first, while in the
second case they will say it rained a lot but approximately the same. They are assuming a
relative measure of difference. As a result, the natural measure of difference is not the usual
Euclidean one and the ordinary vector space structure of $\mathbb{R}$ does not behave suitably. In
fact, problems might appear when shifting a positive number (vector) by a negative real number
(vector), or when multiplying a positive number (vector) by an arbitrary real number (scalar),
because results can be outside $\mathbb{R}_+$.



There are two operations, $\oplus$ and $\odot$, which induce a vector space structure in $\mathbb{R}_+$ (Pawlowsky and Egozcue, 2001).
In fact, given $x, y \in \mathbb{R}_+$, the internal operation, which plays an analogous role to addition
in $\mathbb{R}$, is the usual product $x \oplus y = x \cdot y$ and, for $\alpha \in \mathbb{R}$, the external operation, which
plays an analogous role to the product by scalars in $\mathbb{R}$, is $\alpha \odot x = x^{\alpha}$. An inner product,
compatible with $\oplus$ and $\odot$, is $\langle x, y \rangle_+ = \ln x \cdot \ln y$, which induces a norm, $\|x\|_+ = |\ln x|$,
and a distance, $d_+(x, y) = |\ln y - \ln x|$, and thus the complete Euclidean space structure
in $\mathbb{R}_+$. Since $\mathbb{R}_+$ is a 1-dimensional vector space there are only two orthonormal bases:
the unit-vector ($e$) and its inverse element with respect to the internal operation ($e^{-1}$).
From now on the first option is considered and it will be denoted by $e$. Any $x \in \mathbb{R}_+$ can
be expressed as $x = \ln x \odot e = e^{\ln x}$, which reveals that $h(x) = \ln x$ is the coordinate of
$x$ with respect to the basis $e$. The measure in $\mathbb{R}_+$ can be defined so that, for an interval
$(a, b) \subset \mathbb{R}_+$, $\lambda_+(a, b) = \lambda(\ln a, \ln b) = |\ln b - \ln a|$ and $\mathrm{d}\lambda_+/\mathrm{d}\lambda = 1/x$
(Mateu-Figueras, 2003; Pawlowsky-Glahn, 2003). Following the notation in Section 2, all these
definitions can be obtained by setting $E = \mathbb{R}_+$, $D = d = 1$ and $h(x) = \ln x$. The generalization
to $E = \mathbb{R}_+^D$ is straightforward: for $\mathbf{x} \in \mathbb{R}_+^D$, the coordinate function can be defined as
$h(\mathbf{x}) = \ln(\mathbf{x})$, where the logarithm applies component-wise.
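
The rain gauge example can be checked numerically with the distance $d_+$. The function name `d_plus` is hypothetical; the values are the two pairs quoted in the text.

```python
import math

def d_plus(x, y):
    # distance on the positive real line: |ln y - ln x|
    return abs(math.log(y) - math.log(x))

# rain gauge pairs from the text, in mm
d_low = d_plus(5.0, 10.0)       # equals ln 2
d_high = d_plus(100.0, 105.0)   # equals ln 1.05

# in the ordinary Euclidean sense both pairs differ by the same 5 mm
euclid_low = abs(10.0 - 5.0)
euclid_high = abs(105.0 - 100.0)
```

The Euclidean differences coincide, whereas the relative distance for the first pair is more than ten times that of the second, which matches the meteorologists' reading.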



3.1. The normal distribution on $\mathbb{R}_+$

Using the algebraic-geometric structure in $\mathbb{R}_+$ and the measure $\lambda_+$, the normal distribution
on $\mathbb{R}_+$ is defined by Mateu-Figueras et al. (2002) through the density function of orthonormal
coordinates.

Definition 1. Let $(\Omega, \mathcal{F}, P)$ be a probability space. A random variable $X : \Omega \to \mathbb{R}_+$ is
said to have a normal on $\mathbb{R}_+$ distribution with two parameters $\mu$ and $\sigma^2$, written $\mathcal{N}_+(\mu, \sigma^2)$,
if its density function is

$$ f^+_X(x) = \frac{\mathrm{d}P}{\mathrm{d}\lambda_+}(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2} \frac{(\ln x - \mu)^2}{\sigma^2} \right) , \quad x \in \mathbb{R}_+ . \tag{8} $$

The density (8) is the usual normal density applied to coordinates $\ln x$ as implied by (4) and
it is a density in $\mathbb{R}_+$ with respect to the $\lambda_+$ measure. This density function is completely
restricted to $\mathbb{R}_+$ and its expression corresponds to the law of frequency introduced by
McAlister (1879). The continuous line in Fig. 1 represents the density function (8) for $\mu = 0$
and $\sigma^2 = 1$.




Fig. 1. Density functions $\Lambda(0, 1)$ (dashed line) and $\mathcal{N}_+(0, 1)$ (continuous line).

According to this approach, the normal distribution in $\mathbb{R}_+$ exhibits the same characteristics
as the normal distribution in $\mathbb{R}$, the most relevant of which are summarized in
the following properties. Complete proofs are presented in the appendix.

Property 1. Let $X \sim \mathcal{N}_+(\mu, \sigma^2)$, and constants $a \in \mathbb{R}_+$ and $b \in \mathbb{R}$. Then, the random
variable $X^* = a \oplus (b \odot X) = a \cdot X^b$ is distributed as $\mathcal{N}_+(\ln a + b\mu, b^2\sigma^2)$.

Property 2. Let $X \sim \mathcal{N}_+(\mu, \sigma^2)$ and $a \in \mathbb{R}_+$. Then, $f^+_{a \oplus X}(a \oplus x) = f^+_X(x)$, where
$f^+_X$ and $f^+_{a \oplus X}$ represent the probability density functions of the random variables $X$ and
$a \oplus X = a \cdot X$, respectively.

Property 3. If $X \sim \mathcal{N}_+(\mu, \sigma^2)$, then $\mathrm{E}^+[X] = \mathrm{Med}^+[X] = \mathrm{Mode}^+[X] = e^{\mu}$.

Property 4. If $X \sim \mathcal{N}_+(\mu, \sigma^2)$, then $\mathrm{Var}[X] = \sigma^2$.

Notice that Property 1 implies that the family $\mathcal{N}_+(\mu, \sigma^2)$ is closed under the operations
in $\mathbb{R}_+$ and Property 2 asserts the invariance under translations in $\mathbb{R}_+$.
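
Properties 2 and 3 can be illustrated numerically with the density (8). This is a minimal sketch: the function name `f_plus` and the parameter values are arbitrary assumptions, and the check uses Property 1 with b = 1 to obtain the law of the translated variable.

```python
import math

def f_plus(x, mu, sigma):
    # density (8) with respect to lambda_+: the normal density evaluated at ln x
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

mu, sigma, a, x = 0.3, 1.2, 2.5, 0.7   # arbitrary illustrative values

# Property 1 with b = 1: a (+) X follows N_+(ln a + mu, sigma^2)
shifted = f_plus(a * x, math.log(a) + mu, sigma)
original = f_plus(x, mu, sigma)        # Property 2 states these two agree

# Property 3: the density (8) attains its maximum at exp(mu)
mode_value = f_plus(math.exp(mu), mu, sigma)
```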

The expected value, the median and the mode are elements of the support space $\mathbb{R}_+$,
but the variance is only a numerical value which describes the dispersion of $X$. It is
customary to take the square root of $\sigma^2$ as a way to represent intervals centered at the mean
and with radius equal to some standard deviations. To obtain such an interval centered at
$\mathrm{E}^+[X] = e^{\mu}$ with length $2k\sigma$, take $(e^{\mu - k\sigma}, e^{\mu + k\sigma})$, as $d_+(e^{\mu - k\sigma}, e^{\mu + k\sigma}) = 2k\sigma$. This kind of
interval is used in practice (Ahrens, 1954), and predictive intervals in $\mathbb{R}_+$ are obtained by
taking exponentials of predictive intervals computed from the log-transformed data under the
hypothesis of normality. In Fig. 2(a) we represent the interval $(e^{\mu - \sigma}, e^{\mu + \sigma})$ for a $\mathcal{N}_+(\mu, \sigma^2)$
density function with $\mu = 0$ and $\sigma^2 = 1$. It can be shown that it is of minimum length,
and it is also an isodensity interval; thus, the distribution is symmetric around $e^{\mu}$. This
symmetry might seem paradoxical, as one cannot see it in the shape of the density function.
But still, it is symmetric within the linear vector space structure of $\mathbb{R}_+$, although certainly
not within the Euclidean space structure of $\mathbb{R}_+$ as a subset of $\mathbb{R}$.

An important aspect of this approach is that consistent estimators and exact confidence
intervals for the expected value are easy to obtain. We only have to take exponentials
of those obtained from normal theory using log-transformed data, i.e. the coordinates
with respect to the orthonormal basis. Thus, let $x_1, x_2, \ldots, x_n$ be a random sample and
$y_i = \ln x_i$ for $i = 1, 2, \ldots, n$; then the optimal estimator for the mean of a normal in
$\mathbb{R}_+$ population is the geometric mean $(x_1 x_2 \cdots x_n)^{1/n}$, which equals $e^{\bar{y}}$. An exact
$(1 - \alpha)100\%$ confidence interval for the mean is
$(e^{\bar{y} - t_{\alpha/2}\sqrt{V/n}},\; e^{\bar{y} + t_{\alpha/2}\sqrt{V/n}})$, where $V$ denotes the
logarithmic variance.
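
The estimator and interval just described can be sketched as below. Several points are assumptions rather than statements from the text: the function name is hypothetical, the logarithmic variance is computed with the n - 1 divisor, and the Student t quantile is supplied by the caller (here 2.920, the tabulated two-sided 90% value for 2 degrees of freedom) instead of being computed.

```python
import math

def normal_on_rplus_mean_ci(sample, t_quantile):
    """Geometric mean and confidence interval; t_quantile is supplied by the caller."""
    n = len(sample)
    logs = [math.log(x) for x in sample]                  # coordinates y_i = ln x_i
    y_bar = sum(logs) / n
    v = sum((y - y_bar) ** 2 for y in logs) / (n - 1)     # assumed n - 1 divisor
    half_width = t_quantile * math.sqrt(v / n)
    estimate = math.exp(y_bar)                            # geometric mean
    interval = (math.exp(y_bar - half_width), math.exp(y_bar + half_width))
    return estimate, interval

# hypothetical data whose logarithms are 0, 1 and 2
sample = [1.0, math.e, math.e ** 2]
estimate, (lower, upper) = normal_on_rplus_mean_ci(sample, t_quantile=2.920)
```

The interval is symmetric around the estimate in the sense of $d_+$, so the product of its bounds equals the square of the geometric mean.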




3.2. Normal on $\mathbb{R}_+$ vs lognormal

The lognormal distribution has long been recognized as a useful model in the evaluation
of random phenomena whose distribution is positive and skew, and especially when dealing
with measurements in which the random errors are multiplicative rather than additive.
The history of this distribution starts in 1879, when Galton (1879) observed that the law
of "frequency of errors" was incorrect in many groups of vital and social phenomena. This
observation was based on Fechner's law which, in its approximate and simplest form, is
"sensation = log(stimulus)". According to this law, an error of the same magnitude in excess
or in deficiency (in the absolute sense) is not equally probable; therefore, he proposed the
geometric mean as a measure of the most probable value instead of the arithmetic mean.
This remark was followed by the memoir of McAlister (1879), where a mathematical
investigation concluding with the lognormal distribution is performed. He proposed a practical
and easy method for the treatment of a data set grouped around its geometric mean: "convert
the observations into logarithms and treat the transformed data set as a series round
its arithmetic mean", and introduced a density function called the "law of frequency" which
is the normal density function applied to the log-transformed variable, i.e. density (8). In
order to compute probabilities in given intervals, he also introduced the "law of facility",
nowadays known as the lognormal density function.

A unified treatment of the lognormal theory is presented by Aitchison and Brown (1957)
and more recent developments are compiled by Crow and Shimizu (1988). A great number
of authors use the lognormal model from an applied point of view. Their approach assumes
$\mathbb{R}_+$ to be a subset of the real line with the usual Euclidean geometry. This is how everybody
understands the sentence "an error of the same magnitude in excess or in deficiency" in the
same way. One might ask oneself why there is so much to say about the lognormal distribution
if the data analysis can be referred to the intensively studied normal distribution by taking
logarithms. One of the generally accepted reasons is that parameter estimates are biased if
obtained from the inverse transformation.

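Recall that a positive random variable $X$ is said to be lognormally distributed with two
parameters $\mu$ and $\sigma^2$ if $Y = \ln X$ is normally distributed with mean $\mu$ and variance $\sigma^2$. We
write $X \sim \Lambda(\mu, \sigma^2)$ and its probability density function is

$$ f_X(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma x} \exp\left( -\dfrac{1}{2} \dfrac{(\ln x - \mu)^2}{\sigma^2} \right) , & x > 0 , \\[1ex] 0 , & x \le 0 . \end{cases} \tag{9} $$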

Comparing (9) with (8), we find some subtle differences. In fact, the expression of the
lognormal density (9) includes a case for the zero and for the negative values of the random
variable. This fact is paradoxical, because the lognormal model is completely restricted to
$\mathbb{R}_+$. It is forced by the fact that $\mathbb{R}_+$ is considered as a subset of $\mathbb{R}$ with the same structure
and, consequently, the variable is assumed to be a real random variable, hence the name
"lognormal distribution in $\mathbb{R}$". Another difference lies in the coefficient $1/x$, the Jacobian,
which is necessary to work with real analysis in $\mathbb{R}$. More obvious differences are that (9)
is not invariant under translations, that it is not symmetric around the mean, and that
$\mathrm{E}[X] = e^{\mu + \sigma^2/2}$, while $\mathrm{Med}[X] = e^{\mu}$, and both are different from the mode. The dashed
line in Fig. 1 illustrates the probability density function (9) for $\mu = 0$ and $\sigma^2 = 1$. Observe
that it differs from the density function (8) plotted as a continuous line.

However, we can also find some coincidences between the two models. The median
of a $\Lambda(\mu, \sigma^2)$ model is equal to the median of a $\mathcal{N}_+(\mu, \sigma^2)$ model. The same happens
with any percentile and any value that involves the distribution function in its calculation.
This property can be easily shown using measure theory, in particular using properties of
integration with respect to the adequate measure. In fact, given a lognormally distributed
variable $X$ with parameters $\mu$ and $\sigma^2$, the probability of any interval $(a, b)$ with $0 < a < b$
is

$$ P(a < X < b) = \int_a^b \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\left( -\frac{1}{2} \left( \frac{\ln x - \mu}{\sigma} \right)^2 \right) \mathrm{d}\lambda(x) . $$

The same probability could be computed using the normal in $\mathbb{R}_+$ model. Remember
that in this case we work in the space of coordinates, thus the probability of any interval $(a, b)$
is

$$ P(a < X < b) = \int_{\ln a}^{\ln b} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2} \left( \frac{\ln x - \mu}{\sigma} \right)^2 \right) \mathrm{d}\lambda(\ln x) . $$
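
The two ways of computing an interval probability can be compared numerically. The sketch below is illustrative only: the function names, the interval (0.5, 3) and the parameter values are assumptions, the first integral is approximated with a simple midpoint rule, and the second uses the normal distribution function from Python's standard library.

```python
import math
from statistics import NormalDist

def lognormal_pdf(x, mu, sigma):
    # density (9) with respect to the Lebesgue measure, for x > 0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * x)

def prob_lebesgue(a, b, mu, sigma, steps=20000):
    # midpoint rule for the integral of (9) over (a, b)
    width = (b - a) / steps
    total = sum(lognormal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return total * width

def prob_coordinates(a, b, mu, sigma):
    # normal probability of (ln a, ln b) in the space of coordinates
    nd = NormalDist(mu, sigma)
    return nd.cdf(math.log(b)) - nd.cdf(math.log(a))

p_lognormal = prob_lebesgue(0.5, 3.0, 0.0, 1.0)
p_normal_rplus = prob_coordinates(0.5, 3.0, 0.0, 1.0)
```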

Obviously the same result is obtained in both cases. Therefore we conclude that the
lognormal and the normal in $\mathbb{R}_+$ are the same probability law over $\mathbb{R}_+$.

As we have done for the normal in $\mathbb{R}_+$ case, we could represent an interval centered
at the mean and with radius equal to some standard deviations for the lognormal in $\mathbb{R}$. If
we consider $\mathbb{R}_+$ as a subset of $\mathbb{R}$ with a Euclidean structure, these intervals are
$(\mathrm{E}[X] - k\,\mathrm{Stdev}[X],\; \mathrm{E}[X] + k\,\mathrm{Stdev}[X])$. But this makes no sense, because the lower bound might take
a negative value. For example, for $\mu = 0$ and $\sigma^2 = 1$, the above interval with $k = 1$ is
$(-0.512, 3.810)$. This is the reason why sometimes intervals $(e^{\mu - k\sigma}, e^{\mu + k\sigma})$ are used, which
are considered to be "non-optimal" because they are neither isodensity intervals, nor do they
have minimum length. In Fig. 2(b) we represent the interval $(e^{\mu - \sigma}, e^{\mu + \sigma})$ for the $\Lambda(\mu, \sigma^2)$
density function with $\mu = 0$ and $\sigma^2 = 1$. It is clear that at the bounds of the interval the
density function takes different values.
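
The numerical interval quoted above can be reproduced with the standard expressions for the lognormal mean and variance. The variance formula $(e^{\sigma^2} - 1)e^{2\mu + \sigma^2}$ is the usual textbook one and is not stated in the text; the variable names are illustrative.

```python
import math

mu, sigma, k = 0.0, 1.0, 1.0

# mean and standard deviation of the lognormal in the Euclidean geometry of R
mean = math.exp(mu + sigma ** 2 / 2.0)
stdev = math.sqrt((math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * mu + sigma ** 2))

# interval centred at the mean: its lower bound falls below zero
euclid_interval = (mean - k * stdev, mean + k * stdev)

# interval built in the geometry of the positive real line: always positive
rplus_interval = (math.exp(mu - k * sigma), math.exp(mu + k * sigma))
```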

Consistent estimators and exact confidence intervals for the mean and the variance of a
lognormal variable are difficult to compute. Early methods of estimation are summarised in
Aitchison and Brown (1957) and Crow and Shimizu (1988). The literature certainly contains
an extensive number of procedures and discussions. It is not the objective of this paper
to summarise all methods and to provide a complete set of formulas. But in general we could
say that for the mean, the term $e^{\bar{y}}$ multiplied by a term expressed as an infinite series or
tabulated in a set of tables is obtained in most cases (Aitchison and Brown, 1957; Krige, 1981;
Clark and Harper, 2000). For example, in Clark and Harper (2000) Sichel's optimal
estimator for the mean of a lognormal population is used. This estimator is obtained as
$e^{\bar{y}}\gamma$, where $\gamma$ is a bias correction factor depending on the variance and the size of the data
set and tabulated in a set of tables. A similar bias correction factor is used to obtain
confidence intervals on the population mean (Clark and Harper, 2000). Nevertheless, in
practical situations, the geometric mean or $e^{\bar{y}}$ is used to represent the mean and in some
cases also to represent the mode of a lognormally distributed variable (Herdan, 1960). But
as noted by Crow and Shimizu (1988), those claims cannot be justified using the
lognormal theory. On the contrary, using the normal in $\mathbb{R}_+$ approach they are
completely justified.



3.3. Example 

The importance of using the normal in $\mathbb{R}_+$ instead of the lognormal in $\mathbb{R}$ can be best
appreciated in practice.

In order to compare the classical lognormal estimators with those obtained by the normal
in $\mathbb{R}_+$ approach, we have simulated a sample of 300 values representing sizes of oil fields in
thousands of barrels, a geological variable often lognormally modeled (Davis, 1986). Using
the classical lognormal procedures and table A2 provided in Aitchison and Brown (1957),
we obtain 161.96 as an estimate for the mean. Afterwards, using tables 1, 2 and 3
given in Krige (1981), we obtain 162.00 and (150.31, 176.78) as an estimate and approximate
90% confidence interval for the mean. Also, using tables 7, 8(b) and 8(e) provided in
Clark and Harper (2000), we could apply Sichel's bias correction and we obtain 161.86
and (144.07, 188.39) as the optimal estimator and confidence interval for the mean in the
context of the lognormal approach.

Using the normal in $\mathbb{R}_+$ approach we easily obtain 145.04 as the estimate for the mean
and (138.70, 151.68) as the exact 90% confidence interval for the mean. We only have to
take exponentials of the mean and the 90% confidence interval obtained from normal theory
using log-transformed data. As can be observed, the differences from those obtained using
the lognormal approach are important. With the normal in $\mathbb{R}_+$ a much more conservative
result is obtained.

In order to compare graphically the normal in $\mathbb{R}_+$ and the lognormal approaches we can
represent the histogram with the corresponding fitted densities. In Fig. 3(a) and 3(b) the
histograms with the fitted lognormal and normal in $\mathbb{R}_+$ densities are provided. Note that
the intervals of the histogram are of equal length in both cases, as the absolute Euclidean
distance is used in (a) and the relative distance in $\mathbb{R}_+$, $d_+$, is used in (b) to compute them.
Thus, (b) is a classical histogram but considering the structure defined in Section 2. Finally,
in Fig. 4 the histogram of the log-transformed data or, equivalently, of the coordinates with
respect to the orthonormal basis with the fitted normal density is provided. This last figure
is adequate using both methodologies but in this case we have chosen exactly the same
intervals as in Fig. 3(b). This is only possible using the normal in $\mathbb{R}_+$ approach because the
intervals on the positive real line have corresponding intervals in the space of coordinates.

The normal on $\mathbb{R}_+$ model and its properties have recently been applied in a spatial context
and the results have been compared with those obtained with the classical lognormal kriging
approach (Tolosana-Delgado and Pawlowsky-Glahn, 2007). Using the proposed model and
methodology, the problems of non-optimality, robustness and preservation of distribution
disappear.




Fig. 3. Simulated sample n = 300. Histogram with (a) the fitted lognormal density and (b) the
fitted normal in $\mathbb{R}_+$ density.




Fig. 4. Simulated sample n = 300. Histogram of the log-transformed sample with the fitted normal
density.

4. The simplex

Compositional data are parts of some whole which give only relative information. Typical
examples are parts per unit, percentages, ppm, and the like. Their sample space is the
simplex,

$$ \mathcal{S}^D = \left\{ \mathbf{x} = (x_1, x_2, \ldots, x_D)' : x_1 > 0, x_2 > 0, \ldots, x_D > 0; \; \sum_{i=1}^{D} x_i = \kappa \right\} , $$

where the prime stands for transpose and $\kappa$ is a constant (Aitchison, 1982). For vectors of
proportions which do not sum to a constant, a fill-up value can always be obtained.

The simplex $\mathcal{S}^D$ has a $(D-1)$-dimensional Euclidean space structure (Billheimer et al., 2001;
Pawlowsky and Egozcue, 2001) with the following operations. Let $\mathcal{C}(\cdot)$ denote the closure
operation which normalises any vector $\mathbf{x}$ to a constant sum (Aitchison, 1982), and
let $\mathbf{x}, \mathbf{x}^* \in \mathcal{S}^D$ and $\alpha \in \mathbb{R}$. Then, the inner sum, called perturbation, is defined
as $\mathbf{x} \oplus \mathbf{x}^* = \mathcal{C}(x_1 x_1^*, x_2 x_2^*, \ldots, x_D x_D^*)'$; the outer product, called powering, is defined as
$\alpha \odot \mathbf{x} = \mathcal{C}(x_1^{\alpha}, x_2^{\alpha}, \ldots, x_D^{\alpha})'$; and the inner product is defined as

$$ \langle \mathbf{x}, \mathbf{x}^* \rangle_a = \frac{1}{D} \sum_{i<j} \ln\frac{x_i}{x_j} \ln\frac{x_i^*}{x_j^*} . \tag{10} $$

The associated squared distance is
$d_a^2(\mathbf{x}, \mathbf{x}^*) = (1/D) \sum_{i<j} \left( \ln(x_i/x_j) - \ln(x_i^*/x_j^*) \right)^2$. This
distance is relative and satisfies standard properties of a distance (Martín-Fernández et al., 1998),
i.e. $d_a(\mathbf{x}, \mathbf{x}^*) = d_a(\mathbf{a} \oplus \mathbf{x}, \mathbf{a} \oplus \mathbf{x}^*)$ and $d_a(\alpha \odot \mathbf{x}, \alpha \odot \mathbf{x}^*) = |\alpha|\, d_a(\mathbf{x}, \mathbf{x}^*)$. The geometry
here defined is known as Aitchison geometry, and therefore the subindex $a$ is used.
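
The operations and the two distance properties can be illustrated with a short sketch. The function names and the three compositions are hypothetical, and the closure constant is assumed to be 1.

```python
import math

def closure(v, kappa=1.0):
    # rescale positive parts so that they sum to kappa
    total = sum(v)
    return [kappa * vi / total for vi in v]

def perturb(x, y):
    # inner sum (perturbation): closed component-wise product
    return closure([xi * yi for xi, yi in zip(x, y)])

def power(alpha, x):
    # outer product (powering): closed component-wise power
    return closure([xi ** alpha for xi in x])

def aitchison_dist(x, y):
    # square root of (1/D) * sum over i<j of squared log-ratio differences
    D = len(x)
    s = 0.0
    for i in range(D):
        for j in range(i + 1, D):
            s += (math.log(x[i] / x[j]) - math.log(y[i] / y[j])) ** 2
    return math.sqrt(s / D)

x = closure([1.0, 2.0, 7.0])   # hypothetical 3-part compositions
y = closure([3.0, 3.0, 4.0])
a = closure([5.0, 1.0, 4.0])

d_xy = aitchison_dist(x, y)
d_shifted = aitchison_dist(perturb(a, x), perturb(a, y))      # invariance under perturbation
d_scaled = aitchison_dist(power(-2.0, x), power(-2.0, y))     # scales by |alpha| = 2
```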

The inner product (10) and its associated norm, $\|\mathbf{x}\|_a = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle_a}$, ensure the existence
of orthonormal bases $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_{D-1}\}$, which lead to a unique expression of a composition
$\mathbf{x}$ as a linear combination,

$$ \mathbf{x} = (\langle \mathbf{x}, \mathbf{e}_1 \rangle_a \odot \mathbf{e}_1) \oplus (\langle \mathbf{x}, \mathbf{e}_2 \rangle_a \odot \mathbf{e}_2) \oplus \cdots \oplus (\langle \mathbf{x}, \mathbf{e}_{D-1} \rangle_a \odot \mathbf{e}_{D-1}) . $$

As in every inner product space, the orthonormal basis is not unique. It is not straightforward
to determine which one is the most appropriate to solve a specific problem, but a
promising strategy, based on binary partitions, has been developed by Egozcue and Pawlowsky-Glahn (2005).
Here, whenever a specific basis is needed, the basis given in Egozcue et al. (2003) is used,
with respect to which the coordinates of any $\mathbf{x} \in \mathcal{S}^D$ are

$$ y_i = \frac{1}{\sqrt{i(i+1)}} \ln\left( \frac{x_1 x_2 \cdots x_i}{(x_{i+1})^i} \right) , \quad i = 1, 2, \ldots, D-1 . \tag{11} $$

The coordinates in this particular basis are denoted $\mathrm{ilr}(\mathbf{x})$ to emphasise the similarity with
the vector obtained applying the isometric log-ratio transformation to a composition $\mathbf{x}$,
which is a transformation from $\mathcal{S}^D$ to $\mathbb{R}^{D-1}$ (Egozcue et al., 2003). The important point
is that, once an orthonormal basis has been chosen, all standard statistical methods can be
applied to the coordinates and transferred to the simplex preserving their properties.
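
The coordinates (11) can be computed directly, and the claim that the geometry is preserved can be checked on an example: the Aitchison distance between two compositions should equal the Euclidean distance between their coordinates. The function names and the two compositions below are illustrative assumptions.

```python
import math

def ilr(x):
    # coordinates (11): y_i = ln(x_1 ... x_i / x_{i+1}^i) / sqrt(i (i + 1))
    logs = [math.log(v) for v in x]
    coords = []
    for i in range(1, len(x)):
        numerator = sum(logs[:i]) - i * logs[i]   # logs[i] is ln x_{i+1}
        coords.append(numerator / math.sqrt(i * (i + 1)))
    return coords

def aitchison_dist(x, y):
    # distance associated with the inner product (10)
    D = len(x)
    s = 0.0
    for i in range(D):
        for j in range(i + 1, D):
            s += (math.log(x[i] / x[j]) - math.log(y[i] / y[j])) ** 2
    return math.sqrt(s / D)

x = [0.1, 0.2, 0.7]   # hypothetical 3-part compositions
y = [0.3, 0.3, 0.4]

d_simplex = aitchison_dist(x, y)
d_coords = math.sqrt(sum((u - v) ** 2 for u, v in zip(ilr(x), ilr(y))))
```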

As stated in Section 2, the Lebesgue measure in the space of coordinates induces a
measure in $\mathcal{S}^D$, denoted here as $\lambda_a$. This measure is absolutely continuous with respect
to the Lebesgue measure on real space, and the relationship between them is
$|\mathrm{d}\lambda_a/\mathrm{d}\lambda| = (\sqrt{D}\, x_1 x_2 \cdots x_D)^{-1}$.

Following the notation in Section 2, all these definitions can be obtained by setting
$E = \mathcal{S}^D$ and $d = D - 1$.

For later use, the concept of subcomposition is required. For $C < D$, a $C$-part subcomposition,
$\mathbf{x}_S$, from a $D$-part composition, $\mathbf{x}$, can be obtained as $\mathbf{x}_S = \mathcal{C}(\mathbf{S}\mathbf{x})$, where $\mathbf{S}$ is a
$C \times D$ selection matrix with $C$ elements equal to 1 (one in each row and at most one in each
column) and the remaining elements equal to 0 (Aitchison, 1986). A subcomposition can
be regarded as a composition in a simplex with fewer parts, and thus as a space of lower
dimension.



4.1. Some basic statistical concepts in the simplex

A random composition $\mathbf{X}$ is a random vector with $\mathcal{S}^D$ as domain. In the literature, laws
of probability over $\mathcal{S}^D$ can be found, defined using the standard methodology, i.e. using
the Lebesgue measure. Consequently, the probabilities or any moment are computed using
the classical definition. But some usual elements appear to be of little use when working
with real situations. One typical example is the expected value, which is not
representative as a measure of location. As an alternative, the geometric interpretation
of the expected value has been used to define the centre, $\mathrm{cen}[\mathbf{X}]$, of a random composition
as that composition which minimises the expression $\mathrm{E}[d_a^2(\mathbf{X}, \mathrm{cen}[\mathbf{X}])]$ (Aitchison, 1997;
Pawlowsky and Egozcue, 2001). The result is $\mathrm{cen}[\mathbf{X}] = \mathcal{C}(\exp(\mathrm{E}[\ln \mathbf{X}]))$, which can be
rewritten as (Egozcue et al., 2003) $\mathrm{cen}[\mathbf{X}] = \mathrm{ilr}^{-1}(\mathrm{E}[\mathrm{ilr}(\mathbf{X})])$, or, in general terms, as

$$ \mathrm{cen}[\mathbf{X}] = h^{-1}(\mathrm{E}[h(\mathbf{X})]) . \tag{12} $$

Observe that the centre of a random composition is equal to the expectation in $\mathcal{S}^D$
defined in Section 2. This is an important result because if a law of probability on $\mathcal{S}^D$ is
defined using the classical approach, this equality does not hold.
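
A sample version of the centre can be sketched as the closed vector of component-wise geometric means, and compared with the closed arithmetic mean as a measure of location. The function names and the three compositions are hypothetical; the comparison uses the mean squared Aitchison distance that the centre is stated to minimise.

```python
import math

def closure(v):
    # rescale positive parts so that they sum to 1
    total = sum(v)
    return [vi / total for vi in v]

def aitchison_dist(x, y):
    D = len(x)
    s = 0.0
    for i in range(D):
        for j in range(i + 1, D):
            s += (math.log(x[i] / x[j]) - math.log(y[i] / y[j])) ** 2
    return math.sqrt(s / D)

def centre(sample):
    # sample analogue of C(exp(E[ln X])): closed component-wise geometric means
    n, D = len(sample), len(sample[0])
    return closure([math.exp(sum(math.log(x[k]) for x in sample) / n) for k in range(D)])

def mean_sq_dist(sample, z):
    # sample analogue of E[d_a^2(X, z)]
    return sum(aitchison_dist(x, z) ** 2 for x in sample) / len(sample)

sample = [[0.70, 0.20, 0.10], [0.10, 0.60, 0.30], [0.05, 0.15, 0.80]]  # hypothetical data
cen = centre(sample)
arith = closure([sum(x[k] for x in sample) / len(sample) for k in range(3)])

spread_cen = mean_sq_dist(sample, cen)
spread_arith = mean_sq_dist(sample, arith)
```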

As already mentioned, traditionally the simplex has been considered as a subset of
real space and, consequently, the laws of probability have been defined using the standard
approach. This is the case for families of distributions like the Dirichlet, the additive logistic
normal (Aitchison, 1982), the additive logistic skew-normal (Mateu-Figueras et al., 2005),
or those defined using the Box-Cox family of transformations (Barcelo-Vidal, 1996). Except
for the Dirichlet, these laws of probability are defined using transformations from the simplex
to real space. Two of these transformations will appear later in this paper, the additive
log-ratio (alr) and the centred log-ratio (clr),

$$\mathrm{alr}(\mathbf{x}) = \left(\ln\frac{x_1}{x_D}, \ldots, \ln\frac{x_{D-1}}{x_D}\right), \qquad \mathrm{clr}(\mathbf{x}) = \left(\ln\frac{x_1}{g(\mathbf{x})}, \ldots, \ln\frac{x_D}{g(\mathbf{x})}\right),$$

where $g(\mathbf{x})$ is the geometric mean of composition $\mathbf{x}$. The relationship between the alr and
the clr transformations is provided by Aitchison (1986, p. 92). The relationships between
the alr, clr and ilr transformations are provided by Egozcue et al. (2003).
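The two transformations can be sketched in a few lines. The function names `alr` and `clr`, the example composition and the matrix name `F` are choices made for this illustration. The matrix form $\mathrm{alr}(\mathbf{x}) = \mathbf{F}\,\mathrm{clr}(\mathbf{x})$ used below is just the elementary identity $\ln(x_i/x_D) = \mathrm{clr}_i(\mathbf{x}) - \mathrm{clr}_D(\mathbf{x})$ written with $\mathbf{F} = [\mathbf{I}_{D-1} \mid -\mathbf{1}]$.

```python
import numpy as np

def alr(x):
    """Additive log-ratio: ln(x_i / x_D) for i = 1, ..., D-1."""
    x = np.asarray(x, dtype=float)
    return np.log(x[..., :-1] / x[..., -1:])

def clr(x):
    """Centred log-ratio: ln(x_i / g(x)), with g(x) the geometric mean of the parts."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

x = np.array([0.2, 0.3, 0.5])   # an arbitrary 3-part composition
D = x.size

# F = [I_{D-1} | -1] maps the clr vector to the alr vector.
F = np.hstack([np.eye(D - 1), -np.ones((D - 1, 1))])

print("alr(x)     :", alr(x))
print("clr(x)     :", clr(x), " (sum =", clr(x).sum(), ")")
print("F @ clr(x) :", F @ clr(x))
```

Both transformations depend only on ratios of parts, so rescaling `x` by a positive constant leaves them unchanged.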



4.2. The normal distribution on $\mathcal{S}^D$

Using the algebraic-geometric structure and the measure $\lambda_a$ on $\mathcal{S}^D$, the normal distribution
on $\mathcal{S}^D$ is defined through the density function of generic orthonormal coordinates $\mathbf{h}(\mathbf{X})$
(Mateu-Figueras, 2003). The same strategy is used in Mateu-Figueras and Pawlowsky-Glahn (2007)
to define the skew-normal law on $\mathcal{S}^D$.

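Definition 2. Let $(\Omega, \mathcal{F}, p)$ be a probability space. A random composition $\mathbf{X} : \Omega \to \mathcal{S}^D$
is said to have a regular normal on $\mathcal{S}^D$ distribution, with parameters $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, if its density
function is

$$f_{\mathbf{X}}(\mathbf{x}) = (2\pi)^{-(D-1)/2}\, |\boldsymbol{\Sigma}|^{-1/2} \exp\left(-\frac{1}{2}\,(\mathbf{h}(\mathbf{x}) - \boldsymbol{\mu})'\, \boldsymbol{\Sigma}^{-1} (\mathbf{h}(\mathbf{x}) - \boldsymbol{\mu})\right), \qquad (13)$$

where $\mathbf{h}(\cdot)$ stands for the generic orthonormal coordinates.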

The notation $\mathbf{X} \sim \mathcal{N}_{\mathcal{S}}^{D}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ is used. The subscript $\mathcal{S}$ indicates that it is a model on
the simplex and the superscript $D$ indicates the number of parts of the composition. Fig. 5
shows the isodensity curves of two normal densities on $\mathcal{S}^3$ taking the particular basis given
by Egozcue et al. (2003) and using a ternary diagram as a convenient and simple way of
representing 3-part compositions (see Aitchison, 1986, p. 6).

The density (13) is the usual normal density applied to coordinates $\mathbf{h}(\mathbf{x})$ as implied by
(4), and it is a density in $\mathcal{S}^D$ with respect to the $\lambda_a$ measure.
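A minimal sketch of how the density (13) can be evaluated follows. It assumes one particular orthonormal basis, built by the hypothetical helper `contrast_matrix`, whose coordinates take the form $\sqrt{i/(i+1)}\,\ln\big((x_1 \cdots x_i)^{1/i}/x_{i+1}\big)$; this is the form usually attributed to Egozcue et al. (2003), and any other orthonormal basis could be substituted. The function names and parameter values are illustrative only.

```python
import numpy as np

def contrast_matrix(D):
    """D x (D-1) matrix whose columns are orthonormal and each sum to zero."""
    V = np.zeros((D, D - 1))
    for i in range(1, D):
        V[:i, i - 1] = 1.0 / i
        V[i, i - 1] = -1.0
        V[:, i - 1] *= np.sqrt(i / (i + 1.0))
    return V

def coordinates(x, V):
    """Orthonormal coordinates h(x) = V' ln(x); the columns of V sum to zero."""
    return np.log(np.asarray(x, dtype=float)) @ V

def normal_simplex_pdf(x, mu, Sigma, V):
    """Density (13): multivariate normal density evaluated at h(x)."""
    z = coordinates(x, V) - mu
    quad = z @ np.linalg.solve(Sigma, z)
    norm = np.sqrt((2.0 * np.pi) ** len(mu) * np.linalg.det(Sigma))
    return float(np.exp(-0.5 * quad) / norm)

V = contrast_matrix(3)
mu = np.zeros(2)       # illustrative parameters
Sigma = np.eye(2)

barycentre = np.full(3, 1.0 / 3.0)        # has coordinates (0, 0) in any basis
other = np.array([0.6, 0.3, 0.1])

density_at_barycentre = normal_simplex_pdf(barycentre, mu, Sigma, V)
density_at_other = normal_simplex_pdf(other, mu, Sigma, V)
print(density_at_barycentre, density_at_other)
```

The value returned is a density with respect to $\lambda_a$, not with respect to the Lebesgue measure, so no Jacobian term appears.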

The principal properties of this model follow. A complete proof of each property can be 
found in the appendix. The proofs are straightforward for a reader familiar with composi- 
tional data analysis. 

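Property 5. Let $\mathbf{X} \sim \mathcal{N}_{\mathcal{S}}^{D}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, $\mathbf{a} \in \mathcal{S}^D$ and $b \in \mathbb{R}$. Then, the $D$-part random
composition $\mathbf{X}^* = \mathbf{a} \oplus (b \odot \mathbf{X})$ has a $\mathcal{N}_{\mathcal{S}}^{D}(\mathbf{h}(\mathbf{a}) + b\boldsymbol{\mu},\, b^2 \boldsymbol{\Sigma})$ distribution.

Property 6. Let $\mathbf{X} \sim \mathcal{N}_{\mathcal{S}}^{D}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and $\mathbf{a} \in \mathcal{S}^D$. Then $f_{\mathbf{a} \oplus \mathbf{X}}(\mathbf{a} \oplus \mathbf{x}) = f_{\mathbf{X}}(\mathbf{x})$, where
$f_{\mathbf{X}}$ and $f_{\mathbf{a} \oplus \mathbf{X}}$ represent the density functions of the random compositions $\mathbf{X}$ and $\mathbf{a} \oplus \mathbf{X}$,
respectively.

Property 7. Let $\mathbf{X} \sim \mathcal{N}_{\mathcal{S}}^{D}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and $\mathbf{X}_P = \mathbf{P}\mathbf{X}$, the random composition $\mathbf{X}$ with the
parts reordered by a permutation matrix $\mathbf{P}$. Then $\mathbf{X}_P \sim \mathcal{N}_{\mathcal{S}}^{D}(\boldsymbol{\mu}_P, \boldsymbol{\Sigma}_P)$ with $\boldsymbol{\mu}_P = \mathbf{U}'\mathbf{P}\mathbf{U}\boldsymbol{\mu}$,
$\boldsymbol{\Sigma}_P = (\mathbf{U}'\mathbf{P}\mathbf{U})\boldsymbol{\Sigma}(\mathbf{U}'\mathbf{P}'\mathbf{U})$, where $\mathbf{U}$ is a $D \times (D-1)$ matrix with the clr transformation of
a generic orthonormal basis of $\mathcal{S}^D$ as columns.

Property 8. Let $\mathbf{X} \sim \mathcal{N}_{\mathcal{S}}^{D}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and $\mathbf{X}_S = \mathcal{C}(\mathbf{S}\mathbf{X})$, the $C$-part random subcomposition
obtained from the $C \times D$ selection matrix $\mathbf{S}$. Then $\mathbf{X}_S \sim \mathcal{N}_{\mathcal{S}}^{C}(\boldsymbol{\mu}_S, \boldsymbol{\Sigma}_S)$, with
$\boldsymbol{\mu}_S = \mathbf{U}^{*\prime}\mathbf{S}\mathbf{U}\boldsymbol{\mu}$, $\boldsymbol{\Sigma}_S = (\mathbf{U}^{*\prime}\mathbf{S}\mathbf{U})\boldsymbol{\Sigma}(\mathbf{U}'\mathbf{S}'\mathbf{U}^*)$, where $\mathbf{U}$ is a $D \times (D-1)$ matrix with the
clr transformation of a generic orthonormal basis of $\mathcal{S}^D$ as columns and $\mathbf{U}^*$ is a $C \times (C-1)$
matrix with the clr transformation of a generic orthonormal basis of $\mathcal{S}^C$ as columns.

Property 9. Let $\mathbf{X} \sim \mathcal{N}_{\mathcal{S}}^{D}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. Then, the expected value in $\mathcal{S}^D$ is

$$\mathrm{E}_a[\mathbf{X}] = (\mu_1 \odot \mathbf{e}_1) \oplus (\mu_2 \odot \mathbf{e}_2) \oplus \cdots \oplus (\mu_{D-1} \odot \mathbf{e}_{D-1}). \qquad (14)$$

Property 10. Let $\mathbf{X} \sim \mathcal{N}_{\mathcal{S}}^{D}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. The metric variance of $\mathbf{X}$ is $\mathrm{Var}[\mathbf{X}] = \mathrm{trace}(\boldsymbol{\Sigma})$.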

From Property 5 we conclude that the normal on $\mathcal{S}^D$ law is closed under perturbation
and powering. From Property 6 we see that it is also invariant under perturbation. This
has important consequences, because when working with compositional data the centring
operation (Martin-Fernandez et al., 1999), a perturbation using the inverse of the centre
of the data set, is often applied in practice to better visualise and interpret the pattern of
variability (von Eynatten et al., 2002).
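The two facts behind Properties 5 and 6 can be checked numerically: coordinates are linear in the simplex operations, $\mathbf{h}(\mathbf{a} \oplus (b \odot \mathbf{x})) = \mathbf{h}(\mathbf{a}) + b\,\mathbf{h}(\mathbf{x})$, and the density with respect to $\lambda_a$ is unchanged by perturbation. The sketch below assumes perturbation is the closed component-wise product and powering the closed component-wise power, and uses one particular orthonormal basis (hypothetical helper `contrast_matrix`). All numerical values are illustrative.

```python
import numpy as np

def closure(x):
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def contrast_matrix(D):
    """D x (D-1) matrix whose columns are orthonormal and each sum to zero."""
    V = np.zeros((D, D - 1))
    for i in range(1, D):
        V[:i, i - 1] = 1.0 / i
        V[i, i - 1] = -1.0
        V[:, i - 1] *= np.sqrt(i / (i + 1.0))
    return V

def h(x, V):
    """Orthonormal coordinates of composition x."""
    return np.log(x) @ V

def perturb(a, x):
    """Perturbation a (+) x: closed component-wise product."""
    return closure(a * x)

def power(b, x):
    """Powering b (.) x: closed component-wise power."""
    return closure(x ** b)

def mvn_pdf(v, mu, Sigma):
    """Multivariate normal density in real space."""
    z = v - mu
    quad = z @ np.linalg.solve(Sigma, z)
    return float(np.exp(-0.5 * quad)
                 / np.sqrt((2.0 * np.pi) ** len(mu) * np.linalg.det(Sigma)))

V = contrast_matrix(3)
x = closure([0.2, 0.3, 0.5])
a = closure([0.6, 0.3, 0.1])
b = 1.7
mu = np.array([0.4, -0.2])
Sigma = np.array([[0.5, 0.1], [0.1, 0.3]])

# Linearity of coordinates (the step used in the proof of Property 5).
lhs = h(perturb(a, power(b, x)), V)
rhs = h(a, V) + b * h(x, V)

# Property 6: density of a (+) X at a (+) x equals density of X at x.
f_x = mvn_pdf(h(x, V), mu, Sigma)
f_ax = mvn_pdf(h(perturb(a, x), V), h(a, V) + mu, Sigma)
print(lhs, rhs, f_x, f_ax)
```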



Notice that Properties 7 and 8 show that the normal on $\mathcal{S}^D$ family is closed under
permutation and subcompositions.

Given a compositional data set, the estimates of parameters $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ can be computed
applying the maximum likelihood procedure to the coordinates of the data set. The
estimated values $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\Sigma}}$ allow us to compute the estimates of the expected value and
metric variance of random composition $\mathbf{X}$, as $\hat{\mathrm{E}}_a[\mathbf{X}] = (\hat{\mu}_1 \odot \mathbf{e}_1) \oplus \cdots \oplus (\hat{\mu}_{D-1} \odot \mathbf{e}_{D-1})$ and
$\widehat{\mathrm{Var}}[\mathbf{X}] = \mathrm{trace}(\hat{\boldsymbol{\Sigma}})$.
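The estimation procedure can be sketched as follows, under stated assumptions: the data are synthetic (simulated from known parameters, not a real data set), one particular orthonormal basis is used (hypothetical helper `contrast_matrix`), and the maximum likelihood covariance is obtained with `np.cov(..., bias=True)`, which divides by $n$.

```python
import numpy as np

def closure(x):
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def contrast_matrix(D):
    """D x (D-1) matrix whose columns are orthonormal and each sum to zero."""
    V = np.zeros((D, D - 1))
    for i in range(1, D):
        V[:i, i - 1] = 1.0 / i
        V[i, i - 1] = -1.0
        V[:, i - 1] *= np.sqrt(i / (i + 1.0))
    return V

def to_coords(X, V):
    """Orthonormal coordinates of each composition (row) of X."""
    return np.log(X) @ V

def from_coords(H, V):
    """Composition(s) whose coordinates are H: closure of exp(V h)."""
    return closure(np.exp(H @ V.T))

rng = np.random.default_rng(0)
V = contrast_matrix(3)

# Synthetic sample: simulate coordinates, then map them to the simplex.
mu_true = np.array([0.5, -0.3])
Sigma_true = np.array([[0.4, 0.1], [0.1, 0.2]])
X = from_coords(rng.multivariate_normal(mu_true, Sigma_true, size=500), V)

# Maximum likelihood estimates computed on the coordinates.
H = to_coords(X, V)
mu_hat = H.mean(axis=0)
Sigma_hat = np.cov(H, rowvar=False, bias=True)

# Plug-in estimates of the expected value in the simplex and the metric variance.
expected_value_hat = from_coords(mu_hat, V)
metric_variance_hat = float(np.trace(Sigma_hat))

print("mu_hat              :", mu_hat)
print("expected value (S^3):", expected_value_hat)
print("metric variance     :", metric_variance_hat)
```

With the divisor $n$, the estimated metric variance coincides with the sample mean of the squared distances between the coordinates and their mean.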

To validate the distributional assumption of normality on $\mathcal{S}^D$, some goodness-of-fit tests
of the multivariate normal distribution have to be applied to the coordinates of the sample
data set. There is a large battery of possible tests but, as suggested by Aitchison (1986), we
could start testing the normality of each marginal using empirical distribution function tests.
Unfortunately, the univariate normality of each component is a necessary but not sufficient
condition for the normality of the whole vector. Also, these univariate tests depend on the
orthonormal basis chosen. This difficulty does not depend on the proposed methodology,
as the same problem appears when working with laws of probability defined using
transformations and the Lebesgue measure in $\mathcal{S}^D$ (Aitchison et al., 2003). The multivariate normal
model can also be validated considering the Mahalanobis distance $(\mathbf{h}(\mathbf{X}) - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{h}(\mathbf{X}) - \boldsymbol{\mu})$,
which is sampled from a $\chi^2_{D-1}$-distribution if the fitted model is appropriate. In this case,
the dependence on the chosen orthonormal basis disappears. Here, the use of empirical
distribution function tests is also suggested (Aitchison, 1986).
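The basis invariance of the Mahalanobis distance can be illustrated numerically. In the sketch below the data are synthetic, the first basis comes from the hypothetical helper `contrast_matrix`, and the second basis is obtained by rotating the first with a random orthogonal matrix; both choices are assumptions made for illustration. The marginal coordinates change with the basis, while the squared Mahalanobis distances do not.

```python
import numpy as np

def closure(x):
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def contrast_matrix(D):
    """D x (D-1) matrix whose columns are orthonormal and each sum to zero."""
    V = np.zeros((D, D - 1))
    for i in range(1, D):
        V[:i, i - 1] = 1.0 / i
        V[i, i - 1] = -1.0
        V[:, i - 1] *= np.sqrt(i / (i + 1.0))
    return V

def mahalanobis_sq(H):
    """Squared Mahalanobis distances using the ML estimates of mean and covariance."""
    Z = H - H.mean(axis=0)
    S = np.cov(H, rowvar=False, bias=True)
    return np.einsum("ij,ij->i", Z @ np.linalg.inv(S), Z)

rng = np.random.default_rng(1)
D = 4

# Synthetic 4-part compositions, for illustration only.
V1 = contrast_matrix(D)
X = closure(np.exp(rng.normal(size=(300, D - 1)) @ V1.T))

# A second orthonormal basis: rotate the first by an orthogonal matrix Q.
Q, _ = np.linalg.qr(rng.normal(size=(D - 1, D - 1)))
V2 = V1 @ Q

H1 = np.log(X) @ V1      # coordinates in the first basis
H2 = np.log(X) @ V2      # coordinates in the second basis

d1 = mahalanobis_sq(H1)
d2 = mahalanobis_sq(H2)
print("max |d1 - d2| :", np.max(np.abs(d1 - d2)))
print("mean of d1    :", d1.mean())
```

The squared distances `d1` could then be compared with a $\chi^2_{D-1}$ distribution using an empirical distribution function test. Note that with the divisor-$n$ covariance their sample mean equals $D-1$ by construction, so that quantity alone is not evidence of normality.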




4.3. The normal on $\mathcal{S}^D$ vs the additive logistic normal

The classical approach is used by Aitchison (1982) to define the additive logistic normal
law on the simplex. The strategy is standard: transform the random composition from the
simplex to real space, define the density function of the transformed vector and finally
go back to the simplex using the change of variable theorem. The result is a density
function for the initial random composition with respect to the Lebesgue measure. Thus, a
random composition is said to have an additive logistic normal distribution (aln) when the
additive log-ratio transformed vector has a normal distribution. Note that this definition
does not explicitly state that the change of variable theorem has to be used. But this
is the principal difference between this approach, based on working with transformations,
and the new approach, based on working with coordinates.

The aln model was initially defined using the additive log-ratio transformation. Using
the matrix relationship among the log-ratio transformations (Egozcue et al., 2003) we can
easily obtain the density function in terms of the isometric log-ratio transformation. Thus we
can define the logistic normal distribution with parameters $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, with density function:

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{(2\pi)^{-(D-1)/2}\, |\boldsymbol{\Sigma}|^{-1/2}}{\sqrt{D}\, x_1 x_2 \cdots x_D} \exp\left(-\frac{1}{2}\,(\mathrm{ilr}(\mathbf{x}) - \boldsymbol{\mu})'\, \boldsymbol{\Sigma}^{-1} (\mathrm{ilr}(\mathbf{x}) - \boldsymbol{\mu})\right). \qquad (15)$$



To easily compare both approaches we will use the normal model on the simplex taking
the basis given in Egozcue et al. (2003) and consequently the ilr vector stated in (11).
Nevertheless, we could consider any orthonormal basis, as we can obtain vector $\mathrm{ilr}(\mathbf{x})$ from
$\mathbf{h}(\mathbf{x})$ and the corresponding change of basis matrix. If we compare the expression of the
densities (13) and (15), the only difference is the term $(\sqrt{D}\, x_1 x_2 \cdots x_D)^{-1}$, the Jacobian of
the isometric log-ratio transformation that reflects the change of the measure on $\mathcal{S}^D$. The
influence of this term can be observed in the isodensity curves in Fig. 6. These curves can be
directly compared with the curves in Fig. 5. The differences are obvious, in particular the
trimodality in Fig. 6(a). This behaviour is not exclusive to the logistic normal model; we also
find bimodality with Beta or Dirichlet densities when their parameters tend to 0 and when
the Lebesgue measure is considered. In Fig. 6(b) we observe a unique mode; nevertheless
its position and the shape of the curves are not the same as in Fig. 5(b), the corresponding
normal on $\mathcal{S}^3$.
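The effect of the Jacobian term can be seen by evaluating both densities at two points for $\boldsymbol{\mu} = (0, 0)$ and $\boldsymbol{\Sigma} = \mathbf{Id}$, the setting of Fig. 6(a). The sketch below is illustrative only: the basis (hypothetical helper `contrast_matrix`) and the off-centre test point, displaced from the barycentre towards the first vertex, are assumptions of this example.

```python
import numpy as np

def closure(x):
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def contrast_matrix(D):
    """D x (D-1) matrix whose columns are orthonormal and each sum to zero."""
    V = np.zeros((D, D - 1))
    for i in range(1, D):
        V[:i, i - 1] = 1.0 / i
        V[i, i - 1] = -1.0
        V[:, i - 1] *= np.sqrt(i / (i + 1.0))
    return V

def normal_simplex_pdf(x, mu, Sigma, V):
    """Density with respect to lambda_a: normal density at the coordinates."""
    z = np.log(x) @ V - mu
    quad = z @ np.linalg.solve(Sigma, z)
    return float(np.exp(-0.5 * quad)
                 / np.sqrt((2.0 * np.pi) ** len(mu) * np.linalg.det(Sigma)))

def logistic_normal_pdf(x, mu, Sigma, V):
    """Density with respect to the Lebesgue measure: includes the Jacobian term."""
    jacobian = 1.0 / (np.sqrt(x.size) * np.prod(x))
    return normal_simplex_pdf(x, mu, Sigma, V) * jacobian

V = contrast_matrix(3)
mu, Sigma = np.zeros(2), np.eye(2)

barycentre = np.full(3, 1.0 / 3.0)
# Point displaced from the barycentre towards the first vertex (clr direction).
towards_vertex = closure(np.exp(1.5 * np.array([2.0, -1.0, -1.0]) / np.sqrt(6.0)))

for name, x in [("barycentre", barycentre), ("towards vertex", towards_vertex)]:
    print(name, normal_simplex_pdf(x, mu, Sigma, V), logistic_normal_pdf(x, mu, Sigma, V))
```

With respect to $\lambda_a$ the density is highest at the barycentre, whereas the Lebesgue density is higher at the displaced point than at the barycentre. By symmetry the same holds towards each of the three vertices, which is consistent with the three modes described for Fig. 6(a).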





Fig. 6. Isodensity plots of two logistic normal models with (a) $\boldsymbol{\mu} = (0, 0)$, (b) $\boldsymbol{\mu} = (-1, 1)$ and $\boldsymbol{\Sigma} = \mathrm{Id}$.



Another essential difference between the two models lies in the moments of any order. We
know that the expression of the density function plays a fundamental role when any moment
is computed. The density (15) is a classical density; consequently we compute any moment
using the standard definition. Obviously the results are not the same as in the normal on
$\mathcal{S}^D$ case. For example, the expected value of an aln model exists, but numerical procedures
have to be applied (see Aitchison, 1986, p. 116) to find it and the result is not the same as
in Property 9.

Also, some coincidences can be found. The closure under perturbation, powering,
permutation and subcompositions of the logistic normal model is proved by Aitchison (1986),
the same as those stated in Properties 5, 7 and 8 for the normal on $\mathcal{S}^D$ model. Nevertheless
the logistic normal class is not invariant under perturbation, that is, $f_{\mathbf{a} \oplus \mathbf{X}}(\mathbf{a} \oplus \mathbf{x}) \neq f_{\mathbf{X}}(\mathbf{x})$.

Another coincidence is that the two models assign the same probability to the events
and we can say that both models are equivalent on $\mathcal{S}^D$. In fact, given a logistic normal
distributed random composition $\mathbf{X}$ with parameters $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, the probability of any event
$A \subseteq \mathcal{S}^D$ is

$$P(A) = \int_{A} \frac{(2\pi)^{-(D-1)/2}\, |\boldsymbol{\Sigma}|^{-1/2}}{\sqrt{D}\, x_1 x_2 \cdots x_D} \exp\left(-\frac{1}{2}\,(\mathrm{ilr}(\mathbf{x}) - \boldsymbol{\mu})'\, \boldsymbol{\Sigma}^{-1} (\mathrm{ilr}(\mathbf{x}) - \boldsymbol{\mu})\right) d\lambda(\mathbf{x}), \qquad (16)$$

where now the vector $\mathrm{ilr}(\mathbf{x})$ denotes the isometric log-ratio transformation of vector $\mathbf{x}$. The
same probability using the normal on $\mathcal{S}^D$ model is

$$P(A) = \int_{\mathrm{ilr}(A)} (2\pi)^{-(D-1)/2}\, |\boldsymbol{\Sigma}|^{-1/2} \exp\left(-\frac{1}{2}\,(\mathbf{v} - \boldsymbol{\mu})'\, \boldsymbol{\Sigma}^{-1} (\mathbf{v} - \boldsymbol{\mu})\right) d\lambda(\mathbf{v}), \qquad (17)$$

with $\mathrm{ilr}(A)$ giving now the representation of event $A$ in coordinates with respect to the
particular orthonormal basis given by Egozcue et al. (2003). At this point it is important
to correctly interpret the vector $\mathrm{ilr}(\mathbf{x})$ as the isometric log-ratio transformed vector or as the
vector of coordinates. Therefore, to avoid possible confusion, we denote by $\mathbf{v}$ the vector
of coordinates in expression (17). Certainly, the two vectors are numerically identical, but
here the meaning is important.

Both expressions (16) and (17) are standard integrals of a real valued function. Thus, we
can apply a change of variable in (17), taking $\mathbf{v} = \mathrm{ilr}(\mathbf{x})$, whose Jacobian is $(\sqrt{D}\, x_1 x_2 \cdots x_D)^{-1}$,
and the equality

$$P(A) = \int_{\mathrm{ilr}^{-1}(\mathrm{ilr}(A))} \frac{(2\pi)^{-(D-1)/2}\, |\boldsymbol{\Sigma}|^{-1/2}}{\sqrt{D}\, x_1 x_2 \cdots x_D} \exp\left(-\frac{1}{2}\,(\mathrm{ilr}(\mathbf{x}) - \boldsymbol{\mu})'\, \boldsymbol{\Sigma}^{-1} (\mathrm{ilr}(\mathbf{x}) - \boldsymbol{\mu})\right) d\lambda(\mathbf{x})$$

is obtained. This equality agrees with (16) given that $\mathrm{ilr}^{-1}(\mathrm{ilr}(A)) = A$. Remember that
an isometric log-ratio transformed element is equal to its coordinates with respect to the
orthonormal basis given by Egozcue et al. (2003). Then, the $\mathrm{ilr}^{-1}$ transformation gives the
original element on the simplex. Therefore we conclude that the additive logistic normal
law and the normal on $\mathcal{S}^D$ law are the same probability law over the simplex.
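The equality of the two probabilities can be checked numerically in the simplest case, $D = 2$, where a composition is $(x_1, 1 - x_1)$ and the single coordinate is $\frac{1}{\sqrt{2}}\ln\frac{x_1}{1 - x_1}$. The sketch below is an illustration under assumptions: the parameter values and the event $A = \{0.2 < x_1 < 0.6\}$ are invented, the Lebesgue integral is approximated by a hand-written trapezoidal rule, and the integral in coordinates is computed in closed form with the normal distribution function.

```python
import math
import numpy as np

mu, sigma = 0.3, 0.8          # illustrative parameters of the normal in coordinates
lo, hi = 0.2, 0.6             # event A = {x in S^2 : lo < x_1 < hi}

def ilr2(x1):
    """Coordinate of the 2-part composition (x1, 1 - x1)."""
    return np.log(x1 / (1.0 - x1)) / np.sqrt(2.0)

def normal_pdf(v):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def normal_cdf(v):
    return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))

# Integral in coordinates, as in (17): normal density over the interval ilr(A).
p_coordinates = normal_cdf(ilr2(hi)) - normal_cdf(ilr2(lo))

# Integral on the simplex, as in (16): density with Jacobian 1 / (sqrt(2) x1 x2),
# integrated with respect to the Lebesgue measure by the trapezoidal rule.
x1 = np.linspace(lo, hi, 20001)
f = normal_pdf(ilr2(x1)) / (np.sqrt(2.0) * x1 * (1.0 - x1))
p_lebesgue = float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x1)))

print("P(A) via coordinates :", p_coordinates)
print("P(A) via Lebesgue    :", p_lebesgue)
```

The two numbers agree up to the quadrature error, which is the one-dimensional version of the change of variable argument above.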

Concerning estimation and goodness-of-fit testing, we will obtain exactly the same
results using both models. Remember that in the normal on $\mathcal{S}^D$ case we work with the ilr
coordinates whereas in the logistic normal case we work with the ilr transformed vector.

In summary, the essential differences between both approaches are the shape of the 
probability density function, in some cases leading to multimodality for the standard ap- 
proach; the moments which characterise the density, particularly important in practice for 
the expected value and the variance; and invariance under perturbation. 



4.4. Example 

To illustrate the differences between using a density with respect to the Lebesgue measure $\lambda$
or a density with respect to the measure $\lambda_a$ in $\mathcal{S}^D$, the Skye lavas data (Thompson et al., 1972)
will be used. It contains chemical compositions of 23 basalt specimens from the Isle of Skye
in the form of percentages of 10 major oxides. This data set is used in Aitchison (1982)
to discuss the adequacy of some parametric models and no significant indication of
non-normality is obtained for the alr transformed data set. Due to the matrix relationship
between the alr and ilr transformations, we can easily conclude no significant departure
from normality for the ilr transformed data set.

Our objective in this section is to compare graphically the logistic normal and the
normal on the simplex. Thus, in order to provide some useful figures a 3-part compositional
data set is preferred. For this reason we take $\mathbf{X}$ as the popular AFM subcomposition (A:
$\mathrm{Na_2O} + \mathrm{K_2O}$, F: $\mathrm{Fe_2O_3}$ and M: $\mathrm{MgO}$) from the Skye lavas data set. The resulting data
can be found in Aitchison (1986) or in Thompson et al. (1972). As the first component
is obtained amalgamating two original parts, we cannot guarantee the adequacy of the
logistic normal and the normal on $\mathcal{S}^3$ models. Following the suggestions by Aitchison (1986)
we could test the goodness-of-fit of the model applying a battery of 12 tests, based on the
Anderson-Darling, Cramer-von Mises and Watson statistics, to the coordinates of the sample
data set. In particular, the tests are applied to the marginal distributions, to the bivariate
angle distribution and to the radius. Taking a 1 per cent significance level only one of the
marginal tests gives evidence of any departure from normality.

The fit of a normal model on $\mathcal{S}^3$ and of a logistic normal model (using the ilr transformation)
gives, as noted in the previous section, exactly the same estimates of the parameters
for both models:

$$\hat{\boldsymbol{\mu}} = (0.555, 0.639), \qquad \hat{\boldsymbol{\Sigma}} = \begin{pmatrix} 0.126 & -0.229 \\ -0.229 & 0.456 \end{pmatrix}.$$

Here, the orthonormal basis given by Egozcue et al. (2003) has been used, and consequently
the ilr vector stated in (11).

The fits of the logistic normal and normal on $\mathcal{S}^3$ models are represented with dashed lines in
Fig. 7(a) and 7(b), respectively. The two fitted models are quite similar.

Fig. 7. Skye lavas data (•) and linearly transformed data (*) with isodensity curves of the fitted
(a) logistic normal model and (b) normal on $\mathcal{S}^3$.

As both models follow Property 5, i.e. the families are closed under perturbation and
powering, the transformation $\mathbf{a} \oplus (b \odot \mathbf{X})$ is applied to the data, with $\mathbf{a} = g(\mathbf{X})^{-1}$ and $b = \sqrt{3}$.
This is a linear transformation in $\mathcal{S}^3$ and has been chosen only for illustration purposes. Note
that the geometric mean of our resulting data set is the centre of the simplex, composition
(1/3, 1/3, 1/3), because we first modify the variability applying the power operation and
then centre our data. This is equivalent to translating the transformed data set, or the
coordinates with respect to an orthonormal basis, to the origin of coordinates in real
space. For both resulting models the estimates of the parameters follow the equations stated
in Property 5, i.e.

$$\hat{\boldsymbol{\mu}} = (0.000, 0.000)', \qquad \hat{\boldsymbol{\Sigma}} = \begin{pmatrix} 0.377 & -0.688 \\ -0.688 & 1.369 \end{pmatrix}.$$


In Fig. 7(a) and 7(b) the logistic normal and the normal on $\mathcal{S}^3$ fitted models are
represented with continuous lines. As can be observed, the same linear transformation leads to
a better visualisation of the normal on $\mathcal{S}^3$ fitted model, but in the logistic normal case a
completely different model, with two modes, is obtained. In other words, perturbation and
powering, which should only move the centre of the density and modify the variability, can
generate arbitrary modes, an undesirable property. In Fig. 8 we represent the corresponding
normal densities fitted to the ilr coordinates or, equivalently, to the ilr transformed data
set, because the same graphic is obtained using both methodologies. It is clear that the
linear transformation only increases the variability and translates our data set to the origin
of coordinates.




Fig. 8. ilr coordinates of the Skye lavas data set and the corresponding fitted normal models to the 
original data (dashed line) and to the linear transformed data (continuous line). 



5. Conclusions 

A particular Euclidean vector space structure of the positive real line and of the simplex,
together with the associated measure, allows us to define parametric models with desirable
properties. Normal models on $\mathbb{R}_+$ and on $\mathcal{S}^D$ have been defined through their densities
over the coordinates with respect to an orthonormal basis and their main algebraic
properties have been studied. From a probabilistic point of view, those laws of probability are
identical to the lognormal and to the additive logistic normal distribution defined using the
Lebesgue measure and the standard methodology based on transformations. Nevertheless,
some differences are obtained in the moments and in the shape of the density function. In
particular, the expected value differs from what would be obtained with the lognormal and
with the additive logistic normal distributions, something important when they are used to
characterise real data using a probabilistic model.

APPENDIX

This appendix contains the proofs of properties contained in Sections 3.1 and 4.2. The
construction of these proofs is done using the expected value, the covariance matrix, the
linear transformation property of the multivariate normal distribution and some matrix
relationships among vectors of coordinates and among log-ratio transformations.

Proof of Property 1. The coordinates of the random variable $X^*$ are obtained from the
coordinates of the variable $X$ as $\ln(X^*) = \ln(a) + b \ln(X)$. The density function of $\ln(X)$
is the classical normal density in the real line; thus, the linear transformation property
can be used to obtain the density function of the $\ln(X^*)$ random variable. Therefore,
$X^* \sim \mathcal{N}_+(\ln a + b\mu,\, b^2\sigma^2)$.

Proof of Property 2. From Property 1 we know that $a \oplus X = a \cdot X \sim \mathcal{N}_+(\ln a + \mu,\, \sigma^2)$.
From © we get 



Proof of Property 3. From (6) we know that $\mathrm{E}_+[X] = \exp(\mathrm{E}[\ln X])$, and from the density of
the normal on $\mathbb{R}_+$ model we know that $\ln X$ is normally distributed; thus $\mathrm{E}_+[X] = \exp(\mu)$.
The same result is obtained for the median and the mode as the normal distribution is
symmetric around its expected value $\mu$.

Proof of Property 4. From (7) we know that the variance can be understood as the expected
value of the squared distance around its expected value, i.e. $\mathrm{Var}[X] = \mathrm{E}[d_+^2(X, \mathrm{E}_+[X])]$.
Working on coordinates and using the density function of $\ln X$ we obtain
$\mathrm{Var}[X] = \mathrm{E}[d^2(\ln X, \mathrm{E}[\ln X])] = \mathrm{Var}[\ln X] = \sigma^2$.



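Proof of Property 5. The orthonormal coordinates of the random composition $\mathbf{X}^*$ are
obtained from the orthonormal coordinates of the composition $\mathbf{X}$ via $\mathbf{h}(\mathbf{X}^*) = \mathbf{h}(\mathbf{a}) + b\,\mathbf{h}(\mathbf{X})$.
The density function of the $\mathbf{h}(\mathbf{X})$ random vector is the classical normal density in real space;
thus, the linear transformation property can be used to obtain the density function of the
$\mathbf{h}(\mathbf{X}^*)$ random vector. Therefore, $\mathbf{X}^* \sim \mathcal{N}_{\mathcal{S}}^{D}(\mathbf{h}(\mathbf{a}) + b\boldsymbol{\mu},\, b^2\boldsymbol{\Sigma})$.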

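Proof of Property 6. Using Property 5, $\mathbf{a} \oplus \mathbf{X} \sim \mathcal{N}_{\mathcal{S}}^{D}(\mathbf{h}(\mathbf{a}) + \boldsymbol{\mu},\, \boldsymbol{\Sigma})$. We know that
$\mathbf{h}(\mathbf{a} \oplus \mathbf{x}) = \mathbf{h}(\mathbf{a}) + \mathbf{h}(\mathbf{x})$, therefore,

$$f_{\mathbf{a} \oplus \mathbf{X}}(\mathbf{a} \oplus \mathbf{x}) = (2\pi)^{-(D-1)/2}\, |\boldsymbol{\Sigma}|^{-1/2} \exp\left[-\frac{1}{2}\,(\mathbf{h}(\mathbf{a} \oplus \mathbf{x}) - (\mathbf{h}(\mathbf{a}) + \boldsymbol{\mu}))'\, \boldsymbol{\Sigma}^{-1} (\mathbf{h}(\mathbf{a} \oplus \mathbf{x}) - (\mathbf{h}(\mathbf{a}) + \boldsymbol{\mu}))\right]$$
$$= (2\pi)^{-(D-1)/2}\, |\boldsymbol{\Sigma}|^{-1/2} \exp\left[-\frac{1}{2}\,(\mathbf{h}(\mathbf{x}) - \boldsymbol{\mu})'\, \boldsymbol{\Sigma}^{-1} (\mathbf{h}(\mathbf{x}) - \boldsymbol{\mu})\right] = f_{\mathbf{X}}(\mathbf{x}).$$

Proof of Property 7. For the centred log-ratio transformed vectors it is straightforward to
see that $\mathrm{clr}(\mathbf{X}_P) = \mathbf{P}\,\mathrm{clr}(\mathbf{X})$ (Aitchison, 1986, p. 94). Using the matrix relationship between
the centred and the isometric log-ratio vectors (Egozcue et al., 2003) we conclude that
$\mathbf{h}(\mathbf{X}_P) = (\mathbf{U}'\mathbf{P}\mathbf{U})\,\mathbf{h}(\mathbf{X})$. Given the density of the $\mathbf{h}(\mathbf{X})$ random vector, and applying the
linear transformation property of the normal distribution in real space, a $\mathcal{N}_{\mathcal{S}}^{D}(\boldsymbol{\mu}_P, \boldsymbol{\Sigma}_P)$
distribution is obtained for the random composition $\mathbf{X}_P$.

Proof of Property 8. Aitchison (1986, p. 119) gives the matrix relationship between $\mathrm{alr}(\mathbf{X}_S)$
and $\mathrm{alr}(\mathbf{X})$. Using the matrix relationships between the additive, centred and isometric
log-ratio vectors (Egozcue et al., 2003), we conclude that $\mathbf{h}(\mathbf{X}_S) = (\mathbf{U}^{*\prime}\mathbf{S}\mathbf{U})\,\mathbf{h}(\mathbf{X})$. Given
the density of the $\mathbf{h}(\mathbf{X})$ vector, and applying the linear transformation property of the
normal distribution in real space, the density of the $\mathbf{h}(\mathbf{X}_S)$ vector is obtained as that of the
$\mathcal{N}_{\mathcal{S}}^{C}(\boldsymbol{\mu}_S, \boldsymbol{\Sigma}_S)$ distribution.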
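The two matrix relationships used in the proofs of Properties 7 and 8, $\mathbf{h}(\mathbf{X}_P) = (\mathbf{U}'\mathbf{P}\mathbf{U})\,\mathbf{h}(\mathbf{X})$ and $\mathbf{h}(\mathbf{X}_S) = (\mathbf{U}^{*\prime}\mathbf{S}\mathbf{U})\,\mathbf{h}(\mathbf{X})$, can be verified numerically on a single composition. The sketch assumes one particular orthonormal basis for each simplex (hypothetical helper `contrast_matrix`), and the composition, permutation and selection are invented for illustration.

```python
import numpy as np

def closure(x):
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def contrast_matrix(D):
    """D x (D-1) matrix with the clr vectors of an orthonormal basis as columns."""
    V = np.zeros((D, D - 1))
    for i in range(1, D):
        V[:i, i - 1] = 1.0 / i
        V[i, i - 1] = -1.0
        V[:, i - 1] *= np.sqrt(i / (i + 1.0))
    return V

D = 4
x = closure([0.1, 0.2, 0.3, 0.4])      # an arbitrary 4-part composition
U = contrast_matrix(D)
h_x = np.log(x) @ U                    # coordinates h(x) = U' ln(x)

# Permutation of the parts (Property 7).
P = np.eye(D)[[2, 0, 3, 1]]            # permutation matrix
h_perm_direct = np.log(P @ x) @ U
h_perm_matrix = U.T @ P @ U @ h_x

# Subcomposition formed by parts 1, 2 and 4 (Property 8).
S = np.eye(D)[[0, 1, 3]]               # C x D selection matrix, C = 3
U_star = contrast_matrix(3)
h_sub_direct = np.log(closure(S @ x)) @ U_star
h_sub_matrix = U_star.T @ S @ U @ h_x

print(h_perm_direct, h_perm_matrix)
print(h_sub_direct, h_sub_matrix)
```

Since both maps are linear in the coordinates, the linear transformation property of the multivariate normal gives the parameters stated in Properties 7 and 8.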

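Proof of Property 9. From the definition of the expected value in $\mathcal{S}^D$ given in Section 2 we
know that $\mathrm{E}_a[\mathbf{X}] = \mathbf{h}^{-1}(\mathrm{E}[\mathbf{h}(\mathbf{X})])$, and from Definition 2 we know that the density function
of $\mathbf{h}(\mathbf{X})$ is the multivariate normal distribution; thus $\mathrm{E}[\mathbf{h}(\mathbf{X})] = \boldsymbol{\mu}$. Finally, the composition
$\mathrm{E}_a[\mathbf{X}]$ is obtained applying $\mathbf{h}^{-1}$ or by the representation of this element in the basis,
$(\mu_1 \odot \mathbf{e}_1) \oplus (\mu_2 \odot \mathbf{e}_2) \oplus \cdots \oplus (\mu_{D-1} \odot \mathbf{e}_{D-1})$.

Proof of Property 10. From (7) we know that the variance can be understood as the expected
value of the squared distance around its expected value, i.e. $\mathrm{Var}[\mathbf{X}] = \mathrm{E}[d_a^2(\mathbf{X}, \mathrm{E}_a[\mathbf{X}])]$.
Working on coordinates and using the density function of $\mathbf{h}(\mathbf{X})$ we obtain $\mathrm{Var}[\mathbf{X}] = \mathrm{trace}(\boldsymbol{\Sigma})$.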



